763 research outputs found
Generalized regular expressionsâA language for synthesis of programs with branching in loops
AbstractRegular expressions are generalized to the effect that, besides letters from a finite alphabet, they may also contain natural numbers. Within the framework of these generalized expressions the task of the inductive synthesis of programs from its sample run is formalized. Special automata recognizing the sets defined by generalized expressions are introduced, and their equivalence problem is shown to be recursively solvable. The set-theoretic properties of the sets defined by generalized expressions are also studied
Prediction of gene expression in embryonic structures of Drosophila melanogaster.
Understanding how sets of genes are coordinately regulated in space and time to generate the diversity of cell types that characterise complex metazoans is a major challenge in modern biology. The use of high-throughput approaches, such as large-scale in situ hybridisation and genome-wide expression profiling via DNA microarrays, is beginning to provide insights into the complexities of development. However, in many organisms the collection and annotation of comprehensive in situ localisation data is a difficult and time-consuming task. Here, we present a widely applicable computational approach, integrating developmental time-course microarray data with annotated in situ hybridisation studies, that facilitates the de novo prediction of tissue-specific expression for genes that have no in vivo gene expression localisation data available. Using a classification approach, trained with data from microarray and in situ hybridisation studies of gene expression during Drosophila embryonic development, we made a set of predictions on the tissue-specific expression of Drosophila genes that have not been systematically characterised by in situ hybridisation experiments. The reliability of our predictions is confirmed by literature-derived annotations in FlyBase, by overrepresentation of Gene Ontology biological process annotations, and, in a selected set, by detailed gene-specific studies from the literature. Our novel organism-independent method will be of considerable utility in enriching the annotation of gene function and expression in complex multicellular organisms
A call for BMC Research Notes contributions promoting best practice in data standardization, sharing and publication
BMC Research Notes aims to ensure that data files underlying published articles are made available in standard, reusable formats, and the journal is calling for contributions from the scientific community to achieve this goal. Educational Data Notes included in this special series should describe a domain-specific data standard and provide an example data set with the article, or a link to data that are permanently hosted elsewhere. The contributions should also provide some evidence of the data standard's application and preparation guidance that could be used by others wishing to conduct similar experiments. The journal is also keen to receive contributions on broader aspects of scientific data sharing, archiving, and open data
An integrative clustering approach combining particle swarm optimization and formal concept analysis
MageCometâweb application for harmonizing existing large-scale experiment descriptions
Motivation: Meta-analysis of large gene expression datasets obtained from public repositories requires consistently annotated data. Curation of such experiments, however, is an expert activity which involves repetitive manipulation of text. Existing tools for automated curation are few, which bottleneck the analysis pipeline
Submission of Microarray Data to Public Repositories
The Microarray Gene Expression Data Society believe that the time is right for journals to require that microarray data be deposited in public repositories, as a condition for publicatio
Improving GENCODE reference gene annotation using a high-stringency proteogenomics workflow.
Complete annotation of the human genome is indispensable for medical research. The GENCODE consortium strives to provide this, augmenting computational and experimental evidence with manual annotation. The rapidly developing field of proteogenomics provides evidence for the translation of genes into proteins and can be used to discover and refine gene models. However, for both the proteomics and annotation groups, there is a lack of guidelines for integrating this data. Here we report a stringent workflow for the interpretation of proteogenomic data that could be used by the annotation community to interpret novel proteogenomic evidence. Based on reprocessing of three large-scale publicly available human data sets, we show that a conservative approach, using stringent filtering is required to generate valid identifications. Evidence has been found supporting 16 novel protein-coding genes being added to GENCODE. Despite this many peptide identifications in pseudogenes cannot be annotated due to the absence of orthogonal supporting evidence
Genome Biol.
With genome analysis expanding from the study of genes to the study of gene regulation, 'regulatory genomics' utilizes sequence information, evolution and functional genomics measurements to unravel how regulatory information is encoded in the genome
- âŠ